Results 1 - 20 of 952
1.
BMC Bioinformatics ; 25(1): 165, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38664627

ABSTRACT

BACKGROUND: The annotation of protein sequences in public databases has long posed a challenge in molecular biology. This issue is particularly acute for viral proteins, which demonstrate limited homology to known proteins when using alignment, k-mer, or profile-based homology search approaches. A novel methodology employing Large Language Models (LLMs) addresses this methodological challenge by annotating protein sequences based on embeddings. RESULTS: Central to our contribution is the soft alignment algorithm, drawing from traditional protein alignment but leveraging embedding similarity at the amino acid level to bypass the need for conventional scoring matrices. This method not only surpasses pooled embedding-based models in efficiency but also in interpretability, enabling users to easily trace homologous amino acids and delve deeper into the alignments. Far from being a black box, our approach provides transparent, BLAST-like alignment visualizations, combining traditional biological research with AI advancements to elevate protein annotation through embedding-based analysis while ensuring interpretability. Tests using the Virus Orthologous Groups and ViralZone protein databases indicated that the novel soft alignment approach recognized and annotated sequences that both blastp and pooling-based methods, which are commonly used for sequence annotation, failed to detect. CONCLUSION: The embeddings approach shows the great potential of LLMs for enhancing protein sequence annotation, especially in viral genomics. These findings present a promising avenue for more efficient and accurate protein function inference in molecular biology.


Subjects
Algorithms , Molecular Sequence Annotation , Sequence Alignment , Molecular Sequence Annotation/methods , Sequence Alignment/methods , Viral Proteins/genetics , Viral Proteins/chemistry , Viral Genes , Protein Databases , Computational Biology/methods , Amino Acid Sequence
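
Illustrative sketch for record 1. The abstract describes a soft alignment that uses embedding similarity at the amino acid level in place of a conventional scoring matrix. The Python below is a minimal sketch of that general idea, not the authors' implementation: it runs a Smith-Waterman-style local alignment in which the substitution score is the cosine similarity of two per-residue embedding vectors minus an offset. The function names, the `gap` and `offset` values, and the toy two-dimensional embeddings are all assumptions made for illustration; real embeddings would come from a protein language model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def soft_local_align(emb_a, emb_b, gap=0.5, offset=0.5):
    """Local alignment scored by embedding similarity (hypothetical parameters).

    emb_a, emb_b: lists of per-residue embedding vectors.
    Returns (best_score, list of aligned (index_in_a, index_in_b) pairs).
    """
    n, m = len(emb_a), len(emb_b)
    # H[i][j] = best local alignment score ending at residue i of a and j of b.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Similarity minus offset, so dissimilar residues score below zero.
            s = cosine(emb_a[i - 1], emb_b[j - 1]) - offset
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, H[i - 1][j] - gap, H[i][j - 1] - gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = cosine(emb_a[i - 1], emb_b[j - 1]) - offset
        if math.isclose(H[i][j], H[i - 1][j - 1] + s):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif math.isclose(H[i][j], H[i - 1][j] - gap):
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs
```

Because the traceback returns residue pairs, the result can be printed as a BLAST-like alignment, which is the interpretability property the abstract emphasizes.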
2.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38640488

ABSTRACT

MOTIVATION: The ENCODE project generated a large collection of eCLIP-seq RNA binding protein (RBP) profiling data with accompanying RNA-seq transcriptomes of shRNA knockdown of RBPs. These data could have utility in understanding the functional impact of genetic variants, however their potential has not been fully exploited. We implement INCA (Integrative annotation scores of variants for impact on RBP activities) as a multi-step genetic variant scoring approach that leverages the ENCODE RBP data together with ClinVar and integrates multiple computational approaches to aggregate evidence. RESULTS: INCA evaluates variant impacts on RBP activities by leveraging genotypic differences in cell lines used for eCLIP-seq. We show that INCA provides critical specificity, beyond generic scoring for RBP binding disruption, for candidate variants and their linkage-disequilibrium partners. As a result, it can, on average, augment scoring of 46.2% of the candidate variants beyond generic scoring for RBP binding disruption and aid in variant prioritization for follow-up analysis. AVAILABILITY AND IMPLEMENTATION: INCA is implemented in R and is available at https://github.com/keleslab/INCA.


Subjects
RNA-Binding Proteins , Humans , RNA-Binding Proteins/metabolism , RNA-Binding Proteins/genetics , Software , Genetic Variation , Computational Biology/methods , Molecular Sequence Annotation/methods
3.
J Mol Biol ; 436(4): 168416, 2024 02 15.
Article in English | MEDLINE | ID: mdl-38143020

ABSTRACT

Neuropeptides work not only through the nervous system; some also act peripherally to regulate numerous physiological processes. These processes include growth, reproduction, social behavior, inflammation, fluid homeostasis, cardiovascular function, and energy homeostasis. The various roles of neuropeptides make them promising candidates for prospective therapeutics of different diseases. NeuroPep has now been updated to version 2.0, which holds 11,417 unique neuropeptide entries, nearly double the number in the first version of NeuroPep. When available, we collected information about the receptor for each neuropeptide entry and predicted the 3D structures of those neuropeptides without known experimental structure using AlphaFold2 or APPTEST according to the peptide sequence length. In addition, DeepNeuropePred and NeuroPred-PLM, two neuropeptide prediction tools developed by us recently, were also integrated into NeuroPep 2.0 to facilitate the identification of new neuropeptides. NeuroPep 2.0 is freely accessible at https://isyslab.info/NeuroPepV2/.


Subjects
Protein Databases , Molecular Sequence Annotation , Neuropeptides , Amino Acid Sequence , Neuropeptides/chemistry , Molecular Sequence Annotation/methods
4.
J Biol Chem ; 299(9): 105130, 2023 09.
Article in English | MEDLINE | ID: mdl-37543366

ABSTRACT

Long noncoding RNAs (lncRNAs) are increasingly being recognized as modulators in various biological processes. However, due to their low expression, their systematic characterization is difficult to determine. Here, we performed transcript annotation by a newly developed computational pipeline, termed RNA-seq and small RNA-seq combined strategy (RSCS), in a wide variety of cellular contexts. Thousands of high-confidence potential novel transcripts were identified by the RSCS, and the reliability of the transcriptome was verified by analysis of transcript structure, base composition, and sequence complexity. Evidenced by the length comparison, the frequency of the core promoter and the polyadenylation signal motifs, and the locations of transcription start and end sites, the transcripts appear to be full length. Furthermore, taking advantage of our strategy, we identified a large number of endogenous retrovirus-associated lncRNAs, and a novel endogenous retrovirus-lncRNA that was functionally involved in control of Yap1 expression and essential for early embryogenesis was identified. In summary, the RSCS can generate a more complete and precise transcriptome, and our findings greatly expanded the transcriptome annotation for the mammalian community.


Subjects
Molecular Sequence Annotation , Long Noncoding RNA , RNA-Seq , Animals , Embryonic Development/genetics , Mammals/embryology , Mammals/genetics , Molecular Sequence Annotation/methods , Genetic Promoter Regions/genetics , Reproducibility of Results , Retroviridae/genetics , Long Noncoding RNA/genetics , RNA-Seq/methods , Transcription Initiation Site , Transcriptome/genetics , YAP-Signaling Proteins/genetics , YAP-Signaling Proteins/metabolism
5.
Genome Biol ; 24(1): 135, 2023 06 08.
Article in English | MEDLINE | ID: mdl-37291671

ABSTRACT

BACKGROUND: In every living species, the function of a protein depends on its organization of structural domains, and the length of a protein is a direct reflection of this. Because every species evolved under different evolutionary pressures, the protein length distribution, much like other genomic features, is expected to vary across species but has so far been scarcely studied. RESULTS: Here we evaluate this diversity by comparing protein length distribution across 2326 species (1688 bacteria, 153 archaea, and 485 eukaryotes). We find that proteins tend to be on average slightly longer in eukaryotes than in bacteria or archaea, but that the variation of length distribution across species is low, especially compared to the variation of other genomic features (genome size, number of proteins, gene length, GC content, isoelectric points of proteins). Moreover, most cases of atypical protein length distribution appear to be due to artifactual gene annotation, suggesting the actual variation of protein length distribution across species is even smaller. CONCLUSIONS: These results open the way for developing a genome annotation quality metric based on protein length distribution to complement conventional quality measures. Overall, our findings show that protein length distribution between living species is more uniform than previously thought. Furthermore, we also provide evidence for a universal selection on protein length, yet its mechanism and fitness effect remain intriguing open questions.


Subjects
Molecular Sequence Annotation , Proteins , Protein Sequence Analysis , Amino Acid Sequence , Molecular Sequence Annotation/methods , Proteins/chemistry , Proteins/classification , Proteome , Protein Sequence Analysis/methods , Eukaryota , Bacteria , Archaea
6.
Science ; 380(6643): eabn3107, 2023 04 28.
Article in English | MEDLINE | ID: mdl-37104600

ABSTRACT

Annotating coding genes and inferring orthologs are two classical challenges in genomics and evolutionary biology that have traditionally been approached separately, limiting scalability. We present TOGA (Tool to infer Orthologs from Genome Alignments), a method that integrates structural gene annotation and orthology inference. TOGA implements a different paradigm to infer orthologous loci, improves ortholog detection and annotation of conserved genes compared with state-of-the-art methods, and handles even highly fragmented assemblies. TOGA scales to hundreds of genomes, which we demonstrate by applying it to 488 placental mammal and 501 bird assemblies, creating the largest comparative gene resources so far. Additionally, TOGA detects gene losses, enables selection screens, and automatically provides a superior measure of mammalian genome quality. TOGA is a powerful and scalable method to annotate and compare genes in the genomic era.


Subjects
Eutheria , Genomics , Molecular Sequence Annotation , Animals , Female , Mice , Eutheria/genetics , Genome , Genomics/methods , Molecular Sequence Annotation/methods , Birds/genetics
7.
Science ; 379(6639): 1358-1363, 2023 03 31.
Article in English | MEDLINE | ID: mdl-36996195

ABSTRACT

Enzyme function annotation is a fundamental challenge, and numerous computational tools have been developed. However, most of these tools cannot accurately predict functional annotations, such as enzyme commission (EC) number, for less-studied proteins or those with previously uncharacterized functions or multiple activities. We present a machine learning algorithm named CLEAN (contrastive learning-enabled enzyme annotation) to assign EC numbers to enzymes with better accuracy, reliability, and sensitivity compared with the state-of-the-art tool BLASTp. The contrastive learning framework empowers CLEAN to confidently (i) annotate understudied enzymes, (ii) correct mislabeled enzymes, and (iii) identify promiscuous enzymes with two or more EC numbers-functions that we demonstrate by systematic in silico and in vitro experiments. We anticipate that this tool will be widely used for predicting the functions of uncharacterized enzymes, thereby advancing many fields, such as genomics, synthetic biology, and biocatalysis.


Subjects
Enzymes , Machine Learning , Molecular Sequence Annotation , Proteins , Protein Sequence Analysis , Algorithms , Computational Biology , Enzymes/chemistry , Genomics , Proteins/chemistry , Reproducibility of Results , Molecular Sequence Annotation/methods , Protein Sequence Analysis/methods , Biocatalysis
8.
Sci Rep ; 13(1): 1417, 2023 01 25.
Article in English | MEDLINE | ID: mdl-36697464

ABSTRACT

We report here a new application, CustomProteinSearch (CusProSe), whose purpose is to help users to search for proteins of interest based on their domain composition. The application is customizable. It consists of two independent tools, IterHMMBuild and ProSeCDA. IterHMMBuild allows the iterative construction of Hidden Markov Model (HMM) profiles for conserved domains of selected protein sequences, while ProSeCDA scans a proteome of interest against an HMM profile database, and annotates identified proteins using user-defined rules. CusProSe was successfully used to identify, in fungal genomes, genes encoding key enzyme families involved in secondary metabolism, such as polyketide synthases (PKS), non-ribosomal peptide synthetases (NRPS), hybrid PKS-NRPS and dimethylallyl tryptophan synthases (DMATS), as well as to characterize distinct terpene synthases (TS) sub-families. The highly configurable characteristics of this application make it a generic tool, which allows the user to refine the function of predicted proteins, to extend detection to new enzyme families, and may also be applied to biological systems other than fungi and to proteins other than those involved in secondary metabolism.


Subjects
Fungi , Molecular Sequence Annotation , Secondary Metabolism , Software , Amino Acid Sequence , Molecular Sequence Annotation/methods , Peptide Synthases/genetics , Polyketide Synthases/genetics , Secondary Metabolism/genetics , Fungi/enzymology , Fungi/genetics , Tryptophan Synthase/genetics , Conserved Sequence/genetics
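
Illustrative sketch for record 8. The abstract says ProSeCDA annotates proteins from their domain composition using user-defined rules. The Python below is a rough sketch of what rule-based annotation on domain composition can look like in general. It does not reproduce CusProSe's rule syntax or its HMM search step: the `RULES` table, the domain labels (KS, AT, A, C) and the `annotate` function are hypothetical, and the detected domains are assumed to be given already.

```python
# Hypothetical rules: each family name maps to the set of domains it requires.
# Rules are checked in order, so the most specific rule (the hybrid) comes first.
RULES = [
    ("hybrid PKS-NRPS", {"KS", "AT", "A", "C"}),
    ("PKS", {"KS", "AT"}),
    ("NRPS", {"A", "C"}),
]


def annotate(domains):
    """Return the first family whose required domains are all present, else None."""
    found = set(domains)
    for family, required in RULES:
        if required <= found:  # subset test: every required domain was detected
            return family
    return None
```

Ordering the rules from most to least specific keeps a hybrid enzyme from being reported as a plain PKS or NRPS.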
9.
Nucleic Acids Res ; 50(W1): W57-W65, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35640593

ABSTRACT

The Annotation Query (AnnoQ) (http://annoq.org/) is designed to provide comprehensive and up-to-date functional annotations for human genetic variants. The system is supported by an annotation database with ∼39 million human variants from the Haplotype Reference Consortium (HRC) pre-annotated with sequence feature annotations by WGSA and functional annotations to Gene Ontology (GO) and pathways in PANTHER. The database operates on an optimized Elasticsearch framework to support real-time complex searches. This implementation enables users to annotate data with the most up-to-date functional annotations via simple queries instead of setting up individual tools. A web interface allows users to interactively browse the annotations, annotate variants and search variant data. Its easy-to-use interface and search capabilities are well-suited for scientists with fewer bioinformatics skills such as bench scientists and statisticians. AnnoQ also has an API for users to access and annotate the data programmatically. Packages for programming languages, such as the R package, are available for users to embed the annotation queries in their scripts. AnnoQ serves researchers with a wide range of backgrounds and research interests as an integrated annotation platform.


Subjects
Genetic Variation , Molecular Sequence Annotation , Software , Humans , Genetic Databases , Internet , Molecular Sequence Annotation/methods , User-Computer Interface , Genetic Variation/genetics , Haplotypes/genetics , Programming Languages
10.
Genomics Proteomics Bioinformatics ; 20(3): 455-465, 2022 06.
Article in English | MEDLINE | ID: mdl-34954426

ABSTRACT

The genetic basis of human infertility is currently under intensive investigation. However, only a handful of genes have been validated in animal models as disease-causing genes in infertile men. Thus, to better understand the genetic basis of human spermatogenesis and bridge the knowledge gap between humans and other animal species, we constructed FertilityOnline, a database integrating the literature-curated functional genes during spermatogenesis into an existing spermatogenic database, SpermatogenesisOnline 1.0. Additional features, including the functional annotation and genetic variants of human genes, are also incorporated into FertilityOnline. By searching this database, users can browse the functional genes involved in spermatogenesis and instantly narrow down the number of candidates of genetic mutations underlying male infertility in a user-friendly web interface. Clinical application of this database was exemplified by the identification of novel causative mutations in synaptonemal complex central element protein 1 (SYCE1) and stromal antigen 3 (STAG3) in azoospermic men. In conclusion, FertilityOnline is not only an integrated resource for spermatogenic genes but also a useful tool facilitating the exploration of the genetic basis of male infertility. FertilityOnline can be freely accessed at http://mcg.ustc.edu.cn/bsc/spermgenes2.0/index.html.


Subjects
DNA Mutational Analysis , Genetic Databases , Male Infertility , Molecular Sequence Annotation , Spermatogenesis , Humans , Male , Cell Cycle Proteins/genetics , Male Infertility/genetics , Molecular Sequence Annotation/methods , Mutation , DNA Mutational Analysis/methods , Spermatogenesis/genetics , Online Systems
11.
Gene ; 807: 145952, 2022 Jan 10.
Article in English | MEDLINE | ID: mdl-34500049

ABSTRACT

Extreme temperature is one of the serious threats to crop production in present and future scenarios of global climate change. Lentil (Lens culinaris) is an important crop, and there is a serious lack of genetic information regarding its responses to environmental and temperature stresses. This study is the first report of evaluation of key genes and molecular mechanisms related to temperature stresses in lentil using the RNA sequencing technique. De novo transcriptome assembly created 44,673 contigs and differential gene expression analysis revealed 7494 differentially expressed genes between the temperature stresses and control group. Basic annotation of the transcriptome assembly generated in our study led to the identification of 2765 novel transcripts that have not yet been identified in the lentil genome draft v1.2. In addition, several unigenes involved in mechanisms of temperature sensing, calcium and hormone signaling and DNA-binding transcription factor activity were identified. Also, common mechanisms in response to temperature stresses, including proline biosynthesis, balancing of the photosynthetic light reactions, chaperone activity and circadian rhythms, were determined from the hub genes through analysis of the protein-protein interaction networks. Deciphering the mechanisms of extreme temperature tolerance would be a new way for developing crops with enhanced plasticity against climate change. In general, this study has identified a set of mechanisms and various genes related to cold and heat stresses which will be useful in better understanding of the lentil's reaction to temperature stresses.


Subjects
Lens (Plant)/growth & development , Lens (Plant)/genetics , Physiological Stress/genetics , Climate Change , Cold Temperature/adverse effects , Cold-Shock Response/genetics , Agricultural Crops/genetics , Gene Expression Profiling/methods , Plant Gene Expression Regulation/genetics , Heat-Shock Response/genetics , Heat-Shock Response/physiology , Hot Temperature/adverse effects , Molecular Sequence Annotation/methods , Photosynthesis , Protein Interaction Maps/genetics , Temperature , Transcriptome/genetics
12.
Gene ; 808: 145996, 2022 Jan 15.
Article in English | MEDLINE | ID: mdl-34634440

ABSTRACT

Russula griseocarnosa is a well-known ectomycorrhizal mushroom, which is mainly distributed in the Southern China. Although several scholars have attempted to isolate and cultivate fungal strains, no accurate method for culture of artificial fruiting bodies has been presented owing to difficulties associated with mycelium growth on artificial media. Herein, we sequenced R. griseocarnosa genome using the second- and third-generation sequencing technologies, followed by de novo assembly of high-throughput sequencing reads, and GeneMark-ES, BLAST, CAZy, and other databases were utilized for functional gene annotation. We also constructed a phylogenetic tree using different species of fungi, and also conducted comparative genomics analysis of R. griseocarnosa against its four representative species. In addition, we evaluated the accuracy of one already sequenced genome of R. griseocarnosa based on the internal transcribed spacer (ITS) sequencing of that type of species. The assembly process resulted in identification of 230 scaffolds with a total genome size of 50.67 Mbp. The gene prediction showed that R. griseocarnosa genome included 14,229 coding sequences (CDs). In addition, 470 RNAs were predicted with 155 transfer RNAs (tRNAs), 49 ribosomal RNAs (rRNAs), 41 small noncoding RNAs (sRNAs), 42 small nuclear RNAs (snRNAs), and 183 microRNAs (miRNAs). The predicted protein sequences of R. griseocarnosa were analyzed to indicate the existence of carbohydrate-active enzymes (CAZymes), and the results revealed that 153 genes encoded CAZymes, which were distributed in 58 CAZyme families. These enzymes included 78 glycoside hydrolases (GHs), 34 glycosyl transferases (GTs), 30 auxiliary activities (AAs), 2 carbohydrate esterases (CEs), 8 carbohydrate-binding modules (CBMs), and only one polysaccharide lyase (PL). Compared with other fungi, R. griseocarnosa had fewer CAZymes, and the number and distribution of CAZymes were similar to other mycorrhizal fungi, such as Tricholoma matsutake and Suillus luteus. Well-defined effector proteins that were associated with mycorrhiza-induced small-secreted proteins (MiSSPs) were not found in R. griseocarnosa, which indicated that there may be some special effector proteins to interact with host plants in R. griseocarnosa. The genome of R. griseocarnosa may provide new insights into the energy metabolism of ectomycorrhizal (ECM) fungi, a reference to study ecosystem and evolutionary diversification of R. griseocarnosa, as well as promoting the study of artificial domestication.


Subjects
Basidiomycota/genetics , Basidiomycota/metabolism , Agaricales/genetics , China , Fungal Genome/genetics , Genomics/methods , Molecular Sequence Annotation/methods , Mycorrhizae/genetics , Mycorrhizae/metabolism , Phylogeny , Whole Genome Sequencing/methods
13.
PLoS Biol ; 19(12): e3001464, 2021 12.
Article in English | MEDLINE | ID: mdl-34871295

ABSTRACT

The UniProt knowledgebase is a public database for protein sequence and function, covering the tree of life and over 220 million protein entries. Now, the whole community can use a new crowdsourcing annotation system to help scale up UniProt curation and receive proper attribution for their biocuration work.


Subjects
Crowdsourcing/methods , Data Curation/methods , Molecular Sequence Annotation/methods , Amino Acid Sequence/genetics , Computational Biology/methods , Protein Databases/trends , Humans , Literature , Proteins/metabolism , Stakeholder Participation
15.
J Am Soc Mass Spectrom ; 32(11): 2644-2654, 2021 Nov 03.
Article in English | MEDLINE | ID: mdl-34633184

ABSTRACT

Enhanced in-source fragmentation/annotation (EISA) has recently been shown to produce fragment ions that match tandem mass spectrometry data across a wide range of small molecules. EISA has been developed to facilitate data-dependent acquisition (DDA), data-independent acquisition (DIA), and multiple-reaction monitoring (MRM), enabling molecular identifications in untargeted metabolomics and targeted quantitative single-quadrupole MRM (Q-MRM) analyses. Here, EISA has been applied to peptide-based proteomic analysis using optimized in-source fragmentation to generate fragmentation patterns for a mixture of 38 peptides, which were comparable to the b- and y-type fragment ions typically observed in tandem MS experiments. The optimal in-source fragmentation conditions at which high-abundance peptide fragments and precursor ions coexist were compared with automated data-dependent acquisition (DDA) in the same quadrupole time-of-flight (QTOF-MS) mass spectrometer, generating a significantly higher fragment percentage of peptides from both singly and doubly charged b- and y-type fragment (b+, y+, b2+, and y2+) ions. Higher fragment percentages were also observed for these fragment ion series over linear ion trap instrumentation. An XCMS-EISA annotation/deconvolution program was developed, making use of the retention time and peak shape continuity between precursor fragment ions, to perform automated proteomic data analysis on the enhanced in-source fragments. Post-translational modification (PTM) characterization on peptides was demonstrated with EISA, producing fragment ions corresponding to a neutral loss of phosphoric acid with greater intensity than observed with DDA on a QTOF-MS. Moreover, Q-MRM demonstrated the ability to use EISA for peptide quantification. The availability of more sophisticated in-source fragmentation informatics, beyond XCMS-EISA, will further enable EISA for sensitive autonomous identification and Q-MRM quantitative analyses in proteomics.


Subjects
Molecular Sequence Annotation/methods , Peptide Fragments/analysis , Peptide Fragments/chemistry , Proteomics/methods , Ions/analysis , Ions/chemistry , Sensitivity and Specificity
16.
STAR Protoc ; 2(4): 100888, 2021 12 17.
Article in English | MEDLINE | ID: mdl-34704076

ABSTRACT

Annotating protein-coding genes can be challenging, especially when searching for the best hits against multiple functional databases. This is partly because of "bad words" appearing as top hits, such as hypothetical or uncharacterized proteins. To help alleviate some of these issues, we designed a bioinformatics tool called NoBadWordsCombiner, which efficiently merges the hits from various databases, strengthening gene definitions by minimizing functional descriptions containing "bad words." Unlike other available tools, NoBadWordsCombiner is user friendly, but it does require users to have some general bioinformatics skills, including a basic understanding of the BLAST package and dash shell in Linux/Unix environments. For complete details on the use and execution of this protocol, please refer to Zhang et al. (2021a).


Subjects
Computational Biology/methods , Genetic Databases , Molecular Sequence Annotation , Sequence Alignment/methods , Software , Animals , Humans , Mice , Molecular Sequence Annotation/methods , Proteins/genetics
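
Illustrative sketch for record 16. The abstract describes merging hits from several functional databases while minimizing descriptions that contain "bad words" such as hypothetical or uncharacterized. The Python below is a minimal sketch of that selection logic only, not the NoBadWordsCombiner tool itself: the `BAD_WORDS` tuple contains just the two terms named in the abstract, and the function name, the input layout (database name mapped to ranked descriptions) and the priority order are assumptions made for illustration.

```python
# The two uninformative terms named in the abstract; a real list would be longer.
BAD_WORDS = ("hypothetical", "uncharacterized")


def pick_description(hits_by_db, db_priority):
    """Pick one functional description for a gene from several databases.

    hits_by_db: dict mapping a database name to its ranked hit descriptions.
    db_priority: database names in the order they should be consulted.
    Returns the first description free of bad words; if every hit contains
    one, falls back to the very first hit seen; returns None with no hits.
    """
    fallback = None
    for db in db_priority:
        for description in hits_by_db.get(db, []):
            if fallback is None:
                fallback = description
            if not any(word in description.lower() for word in BAD_WORDS):
                return description
    return fallback
```

For example, a gene whose top hit in one database is "hypothetical protein" but which has an informative hit in a second database would receive the informative description.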
17.
PLoS Comput Biol ; 17(10): e1009463, 2021 10.
Article in English | MEDLINE | ID: mdl-34710081

ABSTRACT

Experimental data about gene functions curated from the primary literature have enormous value for research scientists in understanding biology. Using the Gene Ontology (GO), manual curation by experts has provided an important resource for studying gene function, especially within model organisms. Unprecedented expansion of the scientific literature and validation of the predicted proteins have increased both data value and the challenges of keeping pace. Capturing literature-based functional annotations is limited by the ability of biocurators to handle the massive and rapidly growing scientific literature. Within the community-oriented wiki framework for GO annotation called the Gene Ontology Normal Usage Tracking System (GONUTS), we describe an approach to expand biocuration through crowdsourcing with undergraduates. This multiplies the number of high-quality annotations in international databases, enriches our coverage of the literature on normal gene function, and pushes the field in new directions. From an intercollegiate competition judged by experienced biocurators, Community Assessment of Community Annotation with Ontologies (CACAO), we have contributed nearly 5,000 literature-based annotations. Many of those annotations are to organisms not currently well-represented within GO. Over a 10-year history, our community contributors have spurred changes to the ontology not traditionally covered by professional biocurators. The CACAO principle of relying on community members to participate in and shape the future of biocuration in GO is a powerful and scalable model used to promote the scientific enterprise. It also provides undergraduate students with a unique and enriching introduction to critical reading of primary literature and acquisition of marketable skills.


Subjects
Crowdsourcing/methods , Gene Ontology , Molecular Sequence Annotation/methods , Computational Biology , Genetic Databases , Humans , Proteins/genetics , Proteins/physiology
18.
J Parasitol ; 107(5): 799-809, 2021 09 01.
Article in English | MEDLINE | ID: mdl-34648630

ABSTRACT

Taenia solium cysts were collected from pig skeletal muscle and analyzed via a shotgun proteomic approach to identify known proteins in the cyst fluid and to explore host-parasite interactions. Cyst fluid was aseptically collected and analyzed with shotgun liquid chromatography-tandem mass spectrometry (LC-MS/MS). Gene alignment and annotation were performed using Blast2GO software followed by gene ontology analysis of the annotated proteins. The pathways were further analyzed with the Kyoto Encyclopedia of Genes and Genomes (KEGG), and a protein-protein interaction (PPI) network map was generated using STRING software. A total of 158 known proteins were identified, most of which were low-molecular-mass proteins. These proteins were mainly involved in cellular and metabolic processes, and their molecular functions were predominantly related to catalytic activity and binding functions. The pathway enrichment analysis revealed that the known proteins were mainly enriched in the PI3K-Akt and glycolysis/gluconeogenesis signaling pathways. The nodes in the PPI network mainly consisted of enzymes involved in sugar metabolism. The cyst fluid proteins screened in this study may play important roles in the interaction between the cysticerci and the host. The shotgun LC-MS/MS, gene ontology, KEGG, and PPI network map data will be used to identify and analyze the cyst fluid proteome of cysticerci, which will provide a basis for further exploration of the invasion and activities of T. solium.


Subjects
Helminth Proteins/analysis , Proteomics/methods , Taenia solium/chemistry , Animals , Liquid Chromatography , Helminth Proteins/classification , Helminth Proteins/genetics , Helminth Proteins/metabolism , Host-Parasite Interactions , Molecular Sequence Annotation/methods , Molecular Weight , Skeletal Muscle/parasitology , Protein Interaction Maps , Sequence Alignment , Signal Transduction , Swine , Taenia solium/genetics , Tandem Mass Spectrometry
19.
PLoS Genet ; 17(10): e1009768, 2021 10.
Article in English | MEDLINE | ID: mdl-34648488

ABSTRACT

Transposable elements (TEs) constitute the majority of flowering plant DNA, reflecting their tremendous success in subverting, avoiding, and surviving the defenses of their host genomes to ensure their selfish replication. More than 85% of the sequence of the maize genome can be ascribed to past transposition, providing a major contribution to the structure of the genome. Evidence from individual loci has informed our understanding of how transposition has shaped the genome, and a number of individual TE insertions have been causally linked to dramatic phenotypic changes. Genome-wide analyses in maize and other taxa have frequently represented TEs as a relatively homogeneous class of fragmentary relics of past transposition, obscuring their evolutionary history and interaction with their host genome. Using an updated annotation of structurally intact TEs in the maize reference genome, we investigate the family-level dynamics of TEs in maize. Integrating a variety of data, from descriptors of individual TEs like coding capacity, expression, and methylation, as well as similar features of the sequence they inserted into, we model the relationship between attributes of the genomic environment and the survival of TE copies and families. In contrast to the wholesale relegation of all TEs to a single category of junk DNA, these differences reveal a diversity of survival strategies of TE families. Together these generate a rich ecology of the genome, with each TE family representing the evolution of a distinct ecological niche. We conclude that while the impact of transposition is highly family- and context-dependent, a family-level understanding of the ecology of TEs in the genome can refine our ability to predict the role of TEs in generating genetic and phenotypic diversity.


Subjects
DNA Transposable Elements/genetics , Plant Genome/genetics , Zea mays/genetics , Ecosystem , Molecular Evolution , Genome-Wide Association Study/methods , Genomics/methods , Molecular Sequence Annotation/methods , DNA Sequence Analysis/methods
20.
PLoS Comput Biol ; 17(10): e1009423, 2021 10.
Article in English | MEDLINE | ID: mdl-34648491

ABSTRACT

Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These algorithms take as input epigenomic datasets, such as chromatin immunoprecipitation-sequencing (ChIP-seq) measurements of histone modifications or transcription factor binding. They partition the genome and assign a label to each segment such that positions with the same label exhibit similar patterns of input data. SAGA algorithms discover categories of activity such as promoters, enhancers, or parts of genes without prior knowledge of known genomic elements. In this sense, they generally act in an unsupervised fashion like clustering algorithms, but with the additional simultaneous function of segmenting the genome. Here, we review the common methodological framework that underlies these methods, review variants of and improvements upon this basic framework, and discuss the outlook for future work. This review is intended for those interested in applying SAGA methods and for computational researchers interested in improving upon them.


Subjects
Algorithms , Chromatin/genetics , Genome/genetics , Genomics/methods , Molecular Sequence Annotation/methods , Chromatin Immunoprecipitation Sequencing , Histone Code , Humans , Protein Binding
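
Illustrative sketch for record 20. The abstract says SAGA algorithms partition the genome and assign a label to each segment, so that positions with the same label show similar patterns in the input data. The Python below sketches only the final segmentation step, under the assumption that a per-bin label has already been inferred by some model, which is not shown. The function name, the 200 bp bin size and the label strings are hypothetical choices for illustration, not taken from any specific SAGA tool.

```python
from itertools import groupby


def labels_to_segments(labels, bin_size=200):
    """Collapse consecutive bins that share a label into (start, end, label) segments.

    labels: one label per fixed-width genomic bin, in genomic order.
    bin_size: width of each bin in base pairs (an assumed value).
    Coordinates are 0-based and half-open, as in BED files.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run) * bin_size
        segments.append((start, start + length, label))
        start += length
    return segments
```

With this layout, six bins labeled promoter, promoter, quiescent, enhancer, enhancer, enhancer collapse into three segments covering 0-400, 400-600 and 600-1200.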